gut = read.csv("gut_diversity.csv", header=TRUE)
gut$age = scale(gut$age)
gut$exercise = scale(gut$exercise)
gut$weight = scale(gut$weight)
gut$subject = as.factor(gut$subject)
summary(gut)
## subject diversity age.V1 exercise.V1
## 1 : 1 Min. :0.420 Min. :-1.5018075 Min. :-1.2373974
## 2 : 1 1st Qu.:2.777 1st Qu.:-0.8431200 1st Qu.:-0.8540849
## 3 : 1 Median :4.255 Median :-0.1844325 Median :-0.0928588
## 4 : 1 Mean :4.688 Mean : 0.0000000 Mean : 0.0000000
## 5 : 1 3rd Qu.:5.992 3rd Qu.: 0.7706644 3rd Qu.: 0.4092265
## 6 : 1 Max. :9.460 Max. : 1.9233674 Max. : 2.0450529
## (Other):14
## weight.V1
## Min. :-1.3732613
## 1st Qu.:-0.8364117
## Median :-0.1599812
## Mean : 0.0000000
## 3rd Qu.: 0.6936097
## Max. : 1.9122584
##
Ancova is anova plus regression analysis.
Fit a linear model using lm(), with both IVs but no interaction. Look at the intercept and slopes.
model.both_iv = lm(diversity ~ age + exercise, data=gut)
summary(model.both_iv)
##
## Call:
## lm(formula = diversity ~ age + exercise, data = gut)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.14565 -1.18318 0.02118 0.93561 2.62092
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.68750 0.32238 14.540 5.07e-11 ***
## age 0.04636 0.33732 0.137 0.892
## exercise 2.39877 0.33732 7.111 1.75e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.442 on 17 degrees of freedom
## Multiple R-squared: 0.7572, Adjusted R-squared: 0.7286
## F-statistic: 26.51 on 2 and 17 DF, p-value: 5.951e-06
Fit a separate simple linear regression model using lm() for each IV. Compare the intercept and slopes to the multiple regression model.
model.age_only = lm(diversity ~ age, data=gut)
model.exercise_only = lm(diversity ~ exercise, data=gut)
summary(model.age_only)
##
## Call:
## lm(formula = diversity ~ age, data = gut)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.4892 -2.0048 -0.5452 1.3581 5.1851
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.6875 0.6246 7.505 6.01e-07 ***
## age 0.5175 0.6408 0.807 0.43
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.793 on 18 degrees of freedom
## Multiple R-squared: 0.03496, Adjusted R-squared: -0.01866
## F-statistic: 0.652 on 1 and 18 DF, p-value: 0.4299
summary(model.exercise_only)
##
## Call:
## lm(formula = diversity ~ exercise, data = gut)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.11174 -1.16199 0.00008 0.89369 2.60417
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.6875 0.3135 14.954 1.36e-11 ***
## exercise 2.4079 0.3216 7.487 6.22e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.402 on 18 degrees of freedom
## Multiple R-squared: 0.7569, Adjusted R-squared: 0.7434
## F-statistic: 56.05 on 1 and 18 DF, p-value: 6.217e-07
Intercept is same for both. Slope changes.
Simple linear regression of diversity (DV) on weight (IV). Look at significance, either with summary() or anova()
model.weight_only = lm(diversity ~ weight, data=gut)
summary(model.weight_only)
##
## Call:
## lm(formula = diversity ~ weight, data = gut)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.2086 -0.9025 -0.0632 0.6016 3.2008
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.6875 0.4094 11.449 1.07e-09 ***
## weight -2.1175 0.4200 -5.041 8.49e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.831 on 18 degrees of freedom
## Multiple R-squared: 0.5854, Adjusted R-squared: 0.5623
## F-statistic: 25.41 on 1 and 18 DF, p-value: 8.494e-05
Multiple linear regression of diversity (DV) on exercise (IV1) and weight (IV2), no interaction. Look at significance of weight now.
model.ex_wt = lm(diversity ~ weight + exercise, data = gut)
summary(model.ex_wt)
##
## Call:
## lm(formula = diversity ~ weight + exercise, data = gut)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.99210 -0.92707 0.05392 0.86199 2.50362
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.6875 0.3191 14.689 4.31e-11 ***
## weight -0.3606 0.5930 -0.608 0.55118
## exercise 2.1073 0.5930 3.554 0.00244 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.427 on 17 degrees of freedom
## Multiple R-squared: 0.7621, Adjusted R-squared: 0.7341
## F-statistic: 27.23 on 2 and 17 DF, p-value: 5.004e-06
Adding exercise changed the contribution of weight significantly. Looks like exercise complicated things a bit.
Plot weight v. exercise to see what’s going on.
ggplot(
data=gut,
aes(
x = weight,
y = exercise
)
) +
geom_point()
cor(gut$weight, gut$exercise)
## [,1]
## [1,] -0.8337587
Looks like there’s a correlation between the 2 IVs. That explains what we’re seeing with the diminished effect of weight.
One option: just drop one of the variables. Another option: model the interaction
# pairs just shows a plot of each variable against each other. Sort of
# nice for quick visualization.
pairs(~age+exercise+weight, data=gut)
# pairs.panels (relies on psych package, can't handle ~ inputs) is also cool!
pairs.panels(gut[ ,3:5])
Fit the gut OTU Shannon diversity model with an interaction (DV = diversity, IV = age, exercise, and their interaction).
model.age.ex = lm(diversity ~ age * exercise, data=gut)
Test the IVs. There is an interaction, so type III Anova() would probably be best so age * exercise doesn’t give different results than exercise * age… But good news, if ALL your IVs are continuous, you don’t have to worry about changing the contrasts statement. - type III Anova() and summary() give same p-values - anova() is still order-dependent unless perfectly balanced (very unlikely with continuous IVs)
Anova(model.age.ex, type=3)
## Anova Table (Type III tests)
##
## Response: diversity
## Sum Sq Df F value Pr(>F)
## (Intercept) 439.37 1 228.8487 6.728e-11 ***
## age 0.00 1 0.0014 0.9706
## exercise 109.56 1 57.0647 1.157e-06 ***
## age:exercise 4.62 1 2.4048 0.1405
## Residuals 30.72 16
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(model.age.ex)
##
## Call:
## lm(formula = diversity ~ age * exercise, data = gut)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.6555 -1.0236 -0.1549 0.9502 2.3121
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.78748 0.31647 15.128 6.73e-11 ***
## age 0.01216 0.32494 0.037 0.971
## exercise 2.52488 0.33424 7.554 1.16e-06 ***
## age:exercise -0.53591 0.34559 -1.551 0.141
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.386 on 16 degrees of freedom
## Multiple R-squared: 0.7889, Adjusted R-squared: 0.7493
## F-statistic: 19.93 on 3 and 16 DF, p-value: 1.185e-05
I know the interaction is not significant, but pretend that it is. How would you describe, in words (not numbers) the OTU diversity relationship with age and exercise? Include your description in the text of your Rmd file.
Gut microbe diversity has a string, positive correlation with exercise. There is no statistically significant correlation between age and microbe diversity or between microbe diversity and the interaction between age and exercise.
It is hard to plot when you have >1 continuous IV. Try to come up with a way to do this.
gut2 = read.csv("gut_diversity.csv", header=TRUE)
fig <- plot_ly(
type="scatter3d",
mode="markers",
gut2,
x = ~weight,
y = ~age,
z = ~exercise,
color = ~diversity
)
fig